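In recent geospatial research, the importance of modeling large-scale human mobility data through self-supervised learning has grown in parallel with progress in natural language processing driven by self-supervised approaches on large corpora. Although many feasible approaches already exist for geospatial sequence modeling itself, there appears to be room for improvement in evaluation, specifically in how to measure the similarity between generated and reference sequences. In this work, we propose a novel similarity measure, Geo-BLEU, which can be especially useful in the context of geospatial sequence modeling and generation. As the name suggests, the measure is based on BLEU, one of the most popular measures in machine translation research, while introducing spatial proximity into the idea of the n-gram. We compare this measure with an established baseline, dynamic time warping, by applying both to actual generated geospatial sequences. Using crowdsourced annotation data on the similarity between geospatial sequences, collected from 12,000 cases, we show the advantages of the proposed method both quantitatively and qualitatively.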
translated by Google Translate
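The baseline named in the abstract above, dynamic time warping (DTW), can be sketched as follows. This is a minimal textbook version, not the evaluation code used in that work: the function names `euclidean` and `dtw_distance` are hypothetical, and points are assumed to be planar `(x, y)` pairs compared by Euclidean distance rather than geodesic distance.

```python
import math


def euclidean(p, q):
    """Planar distance between two (x, y) points (an assumption; real
    trajectories might call for a geodesic distance instead)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Classic O(n*m) dynamic time warping cost between two point sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning the first i
    # points of seq_a with the first j points of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_b[j-1] is matched again
                cost[i][j - 1],      # seq_a[i-1] is matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


if __name__ == "__main__":
    reference = [(0, 0), (1, 0), (2, 0)]
    generated = [(0, 0), (2, 0)]
    print(dtw_distance(reference, generated))
```

A lower value means the two sequences are more similar. Because DTW aligns whole sequences point by point, it behaves quite differently from an n-gram-based score, which is the contrast the abstract draws.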
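Quantifying the topological similarities between different parts of urban road networks (URNs) enables us to understand urban growth patterns. Although conventional statistics provide useful information about the characteristics of either a single node's direct neighbors or the entire network, such metrics cannot measure the similarity of subnetworks in a way that accounts for local, indirect neighborhood relationships. In this study, we propose a graph-based machine learning method to quantify the spatial homogeneity of subnetworks. We apply the method to 11,790 urban road networks across 30 cities worldwide to measure the spatial homogeneity of road networks within each city and across different cities. We find that intra-city spatial homogeneity is highly associated with socioeconomic indicators such as GDP and population growth. Moreover, inter-city spatial homogeneity, obtained by transferring the model across cities, reveals similarities in urban network structure that originate in Europe and were passed on to cities in the United States and Asia. The socioeconomic development and inter-city similarity revealed by our method can be leveraged to understand cities and to transfer insights between them. The method also helps address urban policy challenges, including network planning in rapidly urbanizing areas and combating regional inequality.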
In various fields of data science, researchers are often interested in estimating the ratio of conditional expectation functions (CEFR). Specifically in causal inference problems, it is sometimes natural to consider ratio-based treatment effects, such as odds ratios and hazard ratios, and even difference-based treatment effects are identified as CEFR in some empirically relevant settings. This chapter develops the general framework for estimation and inference on CEFR, which allows the use of flexible machine learning for infinite-dimensional nuisance parameters. In the first stage of the framework, the orthogonal signals are constructed using debiased machine learning techniques to mitigate the negative impacts of the regularization bias in the nuisance estimates on the target estimates. The signals are then combined with a novel series estimator tailored for CEFR. We derive the pointwise and uniform asymptotic results for estimation and inference on CEFR, including the validity of the Gaussian bootstrap, and provide low-level sufficient conditions to apply the proposed framework to some specific examples. We demonstrate the finite-sample performance of the series estimator constructed under the proposed framework by numerical simulations. Finally, we apply the proposed method to estimate the causal effect of the 401(k) program on household assets.
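As a notational sketch of the target object in the abstract above, a ratio of conditional expectation functions can be written as follows. The symbols $Y_1$, $Y_2$, $X$, and $\theta_0$ are assumptions introduced here for illustration and are not taken from the source.

```latex
\theta_0(x) \;=\; \frac{\mathbb{E}[\,Y_1 \mid X = x\,]}{\mathbb{E}[\,Y_2 \mid X = x\,]},
\qquad \mathbb{E}[\,Y_2 \mid X = x\,] \neq 0 .
```

One familiar example of a difference-based effect that takes this form is the conditional Wald ratio for a binary instrument $Z$ and treatment $D$, $\bigl(\mathbb{E}[Y \mid Z=1, X=x] - \mathbb{E}[Y \mid Z=0, X=x]\bigr) \big/ \bigl(\mathbb{E}[D \mid Z=1, X=x] - \mathbb{E}[D \mid Z=0, X=x]\bigr)$. Whether this is the specific setting the chapter has in mind is not stated in the abstract.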
Background and objective: COVID-19 and its variants have caused significant disruptions in over 200 countries and regions worldwide, affecting the health and lives of billions of people. Detection from chest X-ray (CXR) images has become one of the fastest and easiest ways to identify COVID-19, because radiological pneumonia findings are common in COVID-19 patients. We present a novel high-accuracy COVID-19 detection method that uses CXR images. Methods: Our method consists of two phases. One is self-supervised learning-based pretraining; the other is batch knowledge ensembling-based fine-tuning. Self-supervised learning-based pretraining can learn distinctive representations from CXR images without manually annotated labels. Batch knowledge ensembling-based fine-tuning, in turn, can use the category knowledge of images in a batch, according to their visual feature similarities, to improve detection performance. Unlike our previous implementation, we introduce batch knowledge ensembling into the fine-tuning phase, reducing the memory used in self-supervised learning and improving COVID-19 detection accuracy. Results: On two public COVID-19 CXR datasets, namely a large dataset and an unbalanced dataset, our method exhibited promising COVID-19 detection performance. Our method maintains high detection accuracy even when the number of annotated CXR training images is reduced significantly (e.g., using only 10% of the original dataset). In addition, our method is insensitive to changes in hyperparameters. Conclusions: The proposed method outperforms other state-of-the-art COVID-19 detection methods in different settings. Our method can reduce the workloads of healthcare providers and radiologists.
Purpose: Given the large number of patients screened during the COVID-19 pandemic, computer-aided detection has strong potential to improve clinical workflow efficiency and to reduce the incidence of infections among radiologists and healthcare providers. Because many confirmed COVID-19 cases present radiological findings of pneumonia, radiologic examinations can be useful for fast detection. Chest radiography can therefore be used to screen quickly for COVID-19 during patient triage, helping to prioritize patient care and relieve saturated medical facilities in a pandemic situation. Methods: In this paper, we propose a new learning scheme called self-supervised transfer learning for detecting COVID-19 from chest X-ray (CXR) images. We compared six self-supervised learning (SSL) methods (Cross, BYOL, SimSiam, SimCLR, PIRL-jigsaw, and PIRL-rotation) with the proposed method. Additionally, we compared six pretrained DCNNs (ResNet18, ResNet50, ResNet101, CheXNet, DenseNet201, and InceptionV3) with the proposed method. We provide a quantitative evaluation on the largest open COVID-19 CXR dataset and qualitative results for visual inspection. Results: Our method achieved a harmonic mean (HM) score of 0.985, an AUC of 0.999, and a four-class accuracy of 0.953. We also used the visualization technique Grad-CAM++ to generate visual explanations for the different classes of CXR images, which increases the interpretability of the proposed method. Conclusions: Our results show that knowledge learned from natural images through transfer learning is beneficial for SSL on CXR images and boosts the performance of representation learning for COVID-19 detection. Our method promises to reduce the incidence of infections among radiologists and healthcare providers.
This paper solves a generalized version of the problem of multi-source model adaptation for semantic segmentation. Model adaptation is proposed as a new domain adaptation problem which requires access to a pre-trained model instead of data for the source domain. A general multi-source setting of model adaptation assumes strictly that each source domain shares a common label space with the target domain. As a relaxation, we allow the label space of each source domain to be a subset of that of the target domain and require the union of the source-domain label spaces to be equal to the target-domain label space. For the new setting named union-set multi-source model adaptation, we propose a method with a novel learning strategy named model-invariant feature learning, which takes full advantage of the diverse characteristics of the source-domain models, thereby improving the generalization in the target domain. We conduct extensive experiments in various adaptation settings to show the superiority of our method. The code is available at https://github.com/lzy7976/union-set-model-adaptation.
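Dataset complexity assessment aims to predict classification performance on a dataset by computing its complexity before a classifier is trained; it can also be used for classifier selection and dataset reduction. The training process of deep convolutional neural networks (DCNNs) is iterative and time-consuming because of hyperparameter uncertainty and the domain shift introduced by different datasets. It is therefore meaningful to predict classification performance by effectively assessing dataset complexity before training DCNN models. This paper proposes a novel method called cumulative maximum scaled area under the Laplacian spectrum (CMSAUL), which achieves state-of-the-art complexity assessment performance on six datasets.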
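Background and objective: Sharing medical data is necessary to enable the cross-institutional flow of healthcare information and to build highly accurate computer-aided diagnosis systems. However, the large size of medical datasets, the large amount of memory needed to store deep convolutional neural network (DCNN) models, and the protection of patient privacy are problems that can make medical data sharing inefficient. This study therefore proposes a novel soft-label dataset distillation method for medical data sharing. Methods: The proposed method distills the useful information in medical image data and generates several compressed images with different data distributions for anonymous medical data sharing. In addition, our method can extract the essential weights of DCNN models, reducing the memory required to store trained models for efficient medical data sharing. Results: The proposed method can compress tens of thousands of images into several soft-label images and reduce the size of a trained model to a few hundredths of its original size. The compressed images obtained after distillation are visually anonymized and therefore do not contain patients' private information. Furthermore, high detection performance can be achieved with a small number of compressed images. Conclusions: The experimental results show that the proposed method can improve the efficiency and security of medical data sharing.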
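In many fields, obtaining advanced models depends on large datasets, which makes storing datasets and training models expensive. As a solution, dataset distillation can synthesize a small dataset such that models trained on it achieve performance on par with models trained on the original large dataset. A recently proposed dataset distillation method based on matching network parameters has proved effective on several datasets. However, some parameters are difficult to match during the distillation process, which harms distillation performance. Based on this observation, this paper proposes a new method that addresses the problem with parameter pruning. By pruning difficult-to-match parameters during distillation, the proposed method can synthesize more robust distilled datasets and improve distillation performance. Experimental results on three datasets show that the proposed method outperforms other state-of-the-art (SOTA) dataset distillation methods.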
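Sharing medical datasets between hospitals is challenging because of privacy protection concerns and the enormous cost of transmitting and storing many high-resolution medical images. However, dataset distillation can synthesize a small dataset such that models trained on it achieve performance comparable to that obtained with the original large dataset, which shows its potential for solving existing medical data sharing problems. This paper therefore proposes a novel medical dataset sharing method based on dataset distillation. Experimental results on a COVID-19 chest X-ray image dataset show that our method can achieve high detection performance even with a small number of anonymized chest X-ray images.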